In Week 1, the Dallas Cowboys visited the New York Giants, an NFC East division rival, at MetLife Stadium to open the 2023 season. This analysis focuses on the offense formations the Cowboys ran against the Giants, how successful they were, and where on the field they ran them.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.patheffects as path_effects
from scipy.stats import iqr
# home-built scripts
from import_nfl_pbp import PlayByPlay as Play
from football_field_plot import create_football_field as Field
full_df = Play().retrieve_year(2023)
2023 done.
full_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2816 entries, 0 to 2815 Columns: 384 entries, play_id to n_defense dtypes: float64(202), int32(8), int64(1), object(173) memory usage: 8.2+ MB
This tells us that there are 2816 rows and 384 columns in the data. That is a lot of columns! In the next step we will filter this data down to what we need to study the Dallas offense formations.
dallas_df = full_df[(full_df.possession_team == 'DAL') &
(full_df.week == 1)][[
'game_seconds_remaining', # how many seconds are left in the game
'week', # the week being played
'possession_team', # the team in possession of the ball
'offense_formation', # the formation of the offense
'play_type', # what type of play (pass, run, punt, etc.)
'yards_gained', # how many yards were gained on the play
'yrdln' # the starting yardline of the play
]].reset_index(drop=True)
Now that we have a good dataset to work with, let us inspect the first few rows.
dallas_df.head()
| | game_seconds_remaining | week | possession_team | offense_formation | play_type | yards_gained | yrdln |
|---|---|---|---|---|---|---|---|
| 0 | 3600.0 | 1 | DAL | None | kickoff | 0.0 | DAL 35 |
| 1 | 3183.0 | 1 | DAL | None | extra_point | 0.0 | NYG 15 |
| 2 | 3183.0 | 1 | DAL | None | kickoff | 0.0 | DAL 35 |
| 3 | 3126.0 | 1 | DAL | SHOTGUN | pass | 2.0 | DAL 26 |
| 4 | 3097.0 | 1 | DAL | SINGLEBACK | run | 4.0 | DAL 28 |
What was the frequency of each formation type?
dallas_df.offense_formation.value_counts()
SINGLEBACK    24
SHOTGUN       22
I_FORM         5
EMPTY          4
JUMBO          2
PISTOL         1
Name: offense_formation, dtype: int64
sns.displot(dallas_df, x="yards_gained", hue="offense_formation", element="step", multiple="stack")
plt.title("Offense Formation Frequency", fontsize=24, loc='center')
plt.ylabel("Count", fontsize=16)
plt.xlabel("Yards Gained", fontsize=16)  # the x-axis of this histogram is yards_gained
Text(0.5, 9.444444444444438, 'Yards Gained')
sns.boxplot(data=dallas_df, x="yards_gained", y="offense_formation", hue="offense_formation", dodge=False)
plt.title("Yards Gained by Formation", fontsize=24)
plt.ylabel("Offense Formation", fontsize=16)
plt.xlabel("Yards Gained", fontsize=16)
Text(0.5, 0, 'Yards Gained')
An interesting point is that the data are right-skewed, with outliers in both the Single Back and Shotgun formations.
The outliers were good for Dallas! Below is an image of each formation for an easier mental picture of what the formations look like on TV.
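Returning to the outliers in the boxplot: seaborn draws as outliers any points that fall more than 1.5 times the IQR beyond the edges of the box (its default `whis=1.5`). The sketch below applies that rule by hand. The helper name `iqr_outliers` and the sample values are hypothetical and are not the actual Week 1 data.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return the values that lie outside the boxplot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    spread = q3 - q1  # the interquartile range
    lower = q1 - whis * spread
    upper = q3 + whis * spread
    return values[(values < lower) | (values > upper)]

# Hypothetical yards gained on nine plays (illustration only)
sample_yards = [0, 1, 2, 2, 3, 4, 5, 6, 49]
outliers = iqr_outliers(sample_yards)
print(outliers)
```

With these made-up numbers, only the single long gain is flagged, which is the same pattern the boxplot shows for the two most-used formations.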
Many factors could go into answering this question; to keep this analysis brief, I will rank the formations by how many yards they produced. The metric I used is the interquartile range (IQR), which is the length of the box on the boxplot above, or the 75th percentile minus the 25th percentile. The reason: several outliers inflate the average yards gained, while the IQR is not affected by them.
for formation in dallas_df.offense_formation.unique():
    if formation is not None:
        formation_filter = dallas_df[dallas_df.offense_formation == formation]['yards_gained']
        IQR = np.round(np.percentile(formation_filter, 75) - np.percentile(formation_filter, 25), 1)
        average = np.round(np.mean(formation_filter), 1)
        median = np.round(np.median(formation_filter), 1)
        print(f"{formation} has an IQR of {IQR}, an Average of {average}, and a Median of {median} yards")
SHOTGUN has an IQR of 6.0, an Average of 4.9, and a Median of 2.0 yards
SINGLEBACK has an IQR of 4.8, an Average of 5.2, and a Median of 3.0 yards
I_FORM has an IQR of 2.0, an Average of 1.6, and a Median of 2.0 yards
EMPTY has an IQR of 3.0, an Average of 5.5, and a Median of 6.5 yards
JUMBO has an IQR of 0.5, an Average of 0.5, and a Median of 0.5 yards
PISTOL has an IQR of 0.0, an Average of 0.0, and a Median of 0.0 yards
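The same three statistics can also be computed in one pass with a pandas `groupby`, which skips the `None` formation automatically because missing group keys are dropped by default. This is only a sketch of an alternative, not the code that produced the output above. The `toy_df` values and the helper `iqr_of` are made up for illustration.

```python
import pandas as pd

def iqr_of(series):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return series.quantile(0.75) - series.quantile(0.25)

# Hypothetical plays (not the real Week 1 data); the None row mimics kickoffs
toy_df = pd.DataFrame({
    "offense_formation": ["SHOTGUN"] * 4 + ["SINGLEBACK"] * 4 + [None],
    "yards_gained": [0.0, 2.0, 4.0, 30.0, 1.0, 3.0, 5.0, 7.0, 0.0],
})

summary = (
    toy_df.groupby("offense_formation")["yards_gained"]
    .agg(iqr=iqr_of, average="mean", median="median")
    .round(1)
    .sort_values("iqr", ascending=False)
)
print(summary)
```

Sorting by the `iqr` column gives the ranking directly instead of reading it off the printed lines.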
The Shotgun Formation was the most successful! (6.0 yards IQR)
Followed by the Single Back Formation (4.8 yards IQR)
There may be a reason why these two happened to be the top two most frequent formations...
dallas_df.offense_formation.value_counts()
SINGLEBACK    24
SHOTGUN       22
I_FORM         5
EMPTY          4
JUMBO          2
PISTOL         1
Name: offense_formation, dtype: int64
First, the yard lines need to be extracted from the yrdln column, and the data needs to be saved for each formation. I chose to use a dictionary where the keys are the formation names and the values are filtered and parsed pandas DataFrames.
def clean_formation_data():
    cleaned_data = dict()
    for formation in dallas_df.offense_formation.unique():
        if formation is not None:
            # There is a None type that we don't want to mess with
            formation_filter = dallas_df[dallas_df.offense_formation == formation][[
                'offense_formation',
                'yards_gained',
                'yrdln'
            ]].reset_index(drop=True)
            # Save the team name to a new column
            formation_filter['team_name'] = [item.split(' ')[0] for item in formation_filter.yrdln]
            # Save the yard line values to a new integer column
            formation_filter['yard_line'] = [item.split(' ')[-1] for item in formation_filter.yrdln]
            formation_filter.yard_line = formation_filter.yard_line.astype(int)
            # Adjust the yard line for plotting purposes. The Giants are on the left side, so they only
            # need 10 yards added to compensate for the end zone. The Cowboys are on the right side, so
            # their values need to be subtracted from 110 yards
            formation_filter['adjusted_yard_line'] = [
                (formation_filter.yard_line[x] + 10) if (formation_filter.team_name[x]=='NYG') else
                (110 - formation_filter.yard_line[x]) for x in range(len(formation_filter.team_name))]
            # Update the dictionary with the filtered and parsed DataFrame
            cleaned_data.update({formation:formation_filter})
    return cleaned_data
Verify that the returned data is the expected result
df = clean_formation_data()
df.get('SHOTGUN').head(2)
| | offense_formation | yards_gained | yrdln | team_name | yard_line | adjusted_yard_line |
|---|---|---|---|---|---|---|
| 0 | SHOTGUN | 2.0 | DAL 26 | DAL | 26 | 84 |
| 1 | SHOTGUN | 49.0 | DAL 32 | DAL | 32 | 78 |
The data looks as expected and the adjusted yardline for plotting looks correct!
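For comparison, the same parsing and adjustment can be written with vectorized pandas string methods and `np.where` instead of list comprehensions. This is a sketch under the same assumptions as the function above (Giants side on the left, a 10-yard end zone). The function name `add_adjusted_yard_line` and its `left_team` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def add_adjusted_yard_line(frame, left_team="NYG"):
    """Return a copy of frame with team_name, yard_line and adjusted_yard_line."""
    out = frame.copy()
    tokens = out["yrdln"].str.split(" ")             # "DAL 26" -> ["DAL", "26"]
    out["team_name"] = tokens.str[0]                 # first token: side of the field
    out["yard_line"] = tokens.str[-1].astype(int)    # last token: yard line number
    # Left-side team: add the 10-yard end zone. Right-side team: mirror from 110.
    out["adjusted_yard_line"] = np.where(
        out["team_name"] == left_team,
        out["yard_line"] + 10,
        110 - out["yard_line"],
    )
    return out

# The first two rows match the SHOTGUN rows shown above; the third is made up
toy = pd.DataFrame({"yrdln": ["DAL 26", "DAL 32", "NYG 15"]})
result = add_adjusted_yard_line(toy)
print(result)
```

The first two rows reproduce the adjusted values 84 and 78 from the table above, which is a quick check that the two approaches agree.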
fig, ax = Field()
text = ax.text(112, 13, 'Cowboys', color='white', fontsize=36, rotation=270)
text.set_path_effects([path_effects.Stroke(linewidth=3,
foreground='black'),
path_effects.Normal()])
text = ax.text(3, 18, 'GIANTS', color='white', fontsize=30, rotation=90)
text.set_path_effects([path_effects.Stroke(linewidth=3,
foreground='black'),
path_effects.Normal()])
scatter1 = plt.scatter(df.get('SINGLEBACK').adjusted_yard_line, np.repeat(10, len(df.get('SINGLEBACK').adjusted_yard_line)),
linewidths=1,
label='SINGLEBACK',
s=200,
marker='<'
)
scatter2 = plt.scatter(df.get('SHOTGUN').adjusted_yard_line, np.repeat(16.7, len(df.get('SHOTGUN').adjusted_yard_line)),
linewidths=1,
label='SHOTGUN',
s=200,
marker='<'
)
scatter3 = plt.scatter(df.get('I_FORM').adjusted_yard_line, np.repeat(23.3, len(df.get('I_FORM').adjusted_yard_line)),
linewidths=1,
label='I_FORM',
s=200,
marker='<'
)
scatter4 = plt.scatter(df.get('EMPTY').adjusted_yard_line, np.repeat(30, len(df.get('EMPTY').adjusted_yard_line)),
linewidths=1,
label='EMPTY',
s=200,
marker='<'
)
scatter5 = plt.scatter(df.get('JUMBO').adjusted_yard_line, np.repeat(36.6, len(df.get('JUMBO').adjusted_yard_line)),
linewidths=1,
label='JUMBO',
s=200,
marker='<'
)
scatter6 = plt.scatter(df.get('PISTOL').adjusted_yard_line, np.repeat(43.3, len(df.get('PISTOL').adjusted_yard_line)),
linewidths=1,
label='PISTOL',
s=200,
marker='<'
)
ax.legend(handles=[scatter6, scatter5, scatter4, scatter3, scatter2, scatter1],
loc=1, framealpha=0.95,
bbox_to_anchor=(0.925, 0.933),
labelspacing=1.0
)
plt.title("Cowboys Offense Formations", fontsize=30)
plt.show()
Note: the vertical spacing does not indicate that the play occurred on that part of the field. The markers are separated vertically to avoid overlap.
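The six nearly identical scatter calls could also be generated in a loop, with the vertical rows spaced evenly by `np.linspace`. The sketch below uses a plain matplotlib axes as a stand-in for the home-built `Field()` helper. The function `plot_formations` and the positions in `toy_positions` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical adjusted yard lines per formation (illustration only)
toy_positions = {
    "SINGLEBACK": [84, 60, 35],
    "SHOTGUN": [78, 50],
    "I_FORM": [22],
}

def plot_formations(ax, positions, y_min=10.0, y_max=43.3):
    """Draw one row of markers per formation and return the scatter handles."""
    rows = np.linspace(y_min, y_max, num=len(positions))  # evenly spaced rows
    handles = []
    for (name, xs), y in zip(positions.items(), rows):
        handle = ax.scatter(xs, np.repeat(y, len(xs)),
                            linewidths=1, label=name, s=200, marker="<")
        handles.append(handle)
    # Reverse the handles so the legend order matches the top-to-bottom rows
    ax.legend(handles=handles[::-1], loc=1, framealpha=0.95, labelspacing=1.0)
    return handles

fig, ax = plt.subplots(figsize=(12, 6))  # stand-in for Field()
handles = plot_formations(ax, toy_positions)
```

With six formations, `np.linspace(10, 43.3, 6)` gives rows at roughly 10, 16.7, 23.3, 30, 36.6 and 43.3, close to the hand-picked values used above.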
This brief study shows that the Dallas Cowboys ran six offense formations with the